Setting up a Data from the FIA Database

The FIA Database

The Forest Inventory Analysis database is a long-term database for trees across the United States. It consists of plots that were sampled and all trees over a certain size were measured for DBH and their heights estimated.

Search the web to find the FIA data web pages and especially the documentation. You'll find that you can download the data as MS-Access files or CSV files. The data can be combersome so we've loaded a subset of the data into a single database you can access over our intranet.

The FIA dataset is one of the best documented databases in the natural resource world. Spend some time taking a look at the tables and fields that make up the database. Save this document as you'll need to reference it over the next few weeks as we use the data.

Acessing the Database

We've setup an enterprise level spatial database for you to access various datasets. The database is on a server in an envrionmentally controlled space, however, you can access it through our intranet. Notice I used our "intranet" rather than the "Internet". The database is only available from HSU computers to prevent breakins from the Internet.

The database is implemented with the software PostgreSQL. PostgreSQL is an open source database. It can manage very large databases with simulaneous access from multiple users (i.e. it is a real, enterprise-level, relational database). To access the server, you'll need to run some client software and then provide an IP address and password.

Open "pgAdmin" on your computer and select "File -> Add Server...". Make the "Name" "HSU_GIS_HUB", the "Host" "137.150.44.18", and enter the password provided by the instructor.

Warning: Do not provide this password to anyone outside class. The could access the database and destroy it for our work and the work of others!

When you click "OK", the server name should appear below "Servers". However, there are a number of errors that can occur. You may see a error that pgAmin cannot access the database. If this occurs, try again and if it still fails, make sure you can access the Internet. If you still have problems, let the instructor know.

The pgAmin is a graphicial user interface that uses Transactional SQL (TSQL) to communicate with the database. TSQL includes SQL and a number of commands to manage a database. We'll by using "SELECT" commands to obtain data.

To get started, click on the "+" next to the "GIS_HUB" server. Then click on the "+" next to "Databases". This will show you the databases contained within the database server. Click on the "+" next to "FIA", then click on the "+" next to "Schemas" and then the "+" next to "Tables". You should see the tables for the FIA database that we have imported into the database.

Right click on the "tree" table and select "View Data -> View Top 100 Rows". What do you see? You can use the names at the top of the query results and the FIA docuemntation to find out what each column means. To create this view, PostgreSQL has used a "SELECT" statement to the database. This view is quick and handy but it will not let us get all the data we need.

Close the view and click on the "SQL" button (looks like a magnafying glass). The window that opens allows you to write queries to the database. Note that the title of the window includes the database we are communicating with, "FIA". Type the following and press the green arrow at the top of the window.

SELECT * 
FROM tree

Nitice that we've typed the SQL commands as all caps and the table names in small letters. This is convension that will make your queries easier to read. You'll see similar data appear that you saw before as we have just queried all the rows from the "tree" table. It did take a while to for the database to retrieve this data so lets limit the number of rows (it it takes too long, you can stop the query with the stop button that appears next to the play button):

SELECT * 
FROM tree
LIMIT 100

The query will execute much faster now. Use the "LIMIT" command from now on when you are quering large tables to keep your queries fast, especially as we'll have the entire class executing them at the same time.

Let's find out how many rows the table has using the "count(*)" command:

SELECT count(*) 
FROM plot

You may need to size the column in the "Data Output" panel to see that there are almost a million plots in the database! This is too many for ArcMap to deal with to lets limit it to just California (BlueSpray will do OK with this many and R will be fine).

SELECT count(*) 
FROM plot 
WHERE state_code='CA'

"State_Code" is a column that we added when we imported the data from the individual files so that we could either to state-level work or country wide work. Now let's add this to our "plot" query and check that all the "state_code" values are "CA".

SELECT * 
FROM plot
WHERE state_code='CA'
LIMIT 100

This gets us started but we also want to get data on specific trees. Join the "tree" table using the primary and foreign keys:

SELECT * 
FROM tree
	INNER JOIN plot ON plot.cn=tree.plt_cn
WHERE plot.state_code='CA'
LIMIT 100

At this point you'll notice that we are getting a lot of fields from both tables. Let's limit the number of fields to just those that we are interested in.

SELECT tree.ht,tree.dia,plot.lat,plot.lon,plot.elev, 	
	plot.measyear,plot.measmon,plot.measday 
FROM tree 
	INNER JOIN plot ON plot.cn=tree.plt_cn 
WHERE plot.state_code='CA' 
LIMIT 100

You may have noticed that we are missing the species of tree. This is in the "ref_species" table and can be added through the "spcd" fields. Then, we can get the "common_name" and other names for the species as desired.

SELECT tree.ht,tree.dia,plot.lat,plot.lon,plot.elev, 	 	
	plot.measyear,plot.measmon,plot.measday,common_name 
FROM tree  
	INNER JOIN plot ON plot.cn=tree.plt_cn  
	INNER JOIN ref_species ON ref_species.spcd=tree.spcd
WHERE plot.state_code='CA'        
LIMIT 100

Before continuing, add the tree diameter, genus, species, plot elevation, and plot date fields to the query and check if the results look good. We also want to limit the species to 'Douglas-fir'.

Note that this dataset will not contain any records for plots that do not contain Douglas fir trees because the "INNER JOIN" will only keep records that have matching enries in tree, plot, and ref_species. The dataset will contain mulitiple entries for each plot which may not be as desirable as the maximum height (or diameter) tree for each plot. Add functions and a "GROUP BY" statement to make the query like the following.

Note that when we "GROUP" records, we have to add a function to aggregate the fields that are not in the "GROUP" statement. You'll also want to add an "AS" command to rename the results or they will be named "max", "max", "min", etc.

Note: We've also remove the "LIMIT" command at this point so we get all the records.

SELECT max(tree.ht),max(tree.dia),min(plot.lat),min(plot.lon),min(plot.elev), 
 min(plot.measyear),min(plot.measmon),min(plot.measday) 
FROM tree 
 	INNER JOIN plot ON plot.cn=tree.plt_cn 
 	INNER JOIN ref_species ON ref_species.spcd=tree.spcd
 WHERE plot.state_code='CA' and ref_species.common_name LIKE 'Douglas-fir'
 GROUP BY (tree.plt_cn)

Save this query and the data from it as we will use it to model Douglas-Fir trees. A good name for the data would be "CA_plots_doug_fir.csv".

Querying Data From the Forest Inventory Analysis Database

We'll use the query above to model the maximum height of Doug-firs in plots that contain Douglas-fir trees. We'd also like the try some presence-absence modeling so we need a dataset with all the plots and a 0 for the ones that do not contain Douglas-Fir trees and a 1 for the plots that do.

First, we need to create a query that will return the plots that do contain Douglas-Fir trees in California.

SELECT plt_cn 
FROM tree
       INNER JOIN ref_species ON ref_species.spcd=tree.spcd
WHERE ref_species.common_name LIKE 'Douglas-fir'
GROUP BY plt_cn

Then we need to insert this query into another query that selects the plots who's "cn" numbers appear in the "sub-query". This results in a dataset with information on all the plots where Douglas-Fir trees are present.

SELECT lat,lon,elev,measyear,measmon,measday 
FROM plot
WHERE state_code='CA' AND plot.cn IN (
	SELECT plt_cn 
	FROM tree
		INNER JOIN ref_species ON ref_species.spcd=tree.spcd
	WHERE ref_species.common_name LIKE 'Douglas-fir'
	GROUP BY plt_cn
)

Formatting the SQL query they way I did above is not required but will help you read the command. Use "File->Export" to save this result to a "CSV" file. Now, we just change the "IN" to "NOT IN" and we can get the plots where Douglas-Firt trees are absent:

SELECT lat,lon,elev,measyear,measmon,measday 
FROM plot
WHERE state_code='CA' AND plot.cn NOT IN  (
       SELECT plt_cn FROM tree
       	INNER JOIN ref_species ON ref_species.spcd=tree.spcd
       	WHERE ref_species.common_name LIKE 'Douglas-fir'
     	GROUP BY plt_cn
)

Use "File->Export" to save this result to a "CSV" file.

Preping the Point Files

Load the "Doug_Fir_Present" file ArcGIS. You'll need to "Export" the file as a shapefile to continue.

Add a "Present" field to each file. Then, set the value of "Present" to 1 for the file with Doug-Fir points. The value should be 0 by default for the other file. Then, Merge the files together. Display the resulting "Doug_Fir_Present" file to show where Doug-Fir is present and where it is absent.

Question 1: Is this data more random, dispersed, or uniform and why?

Question 2: Do you see any potential problems in using this dataset?

Adding Corvaiates

Your corvaiates should be available as raster files with the same spatial reference and cell size. Add your covariates into ArcGIS. Use the "Extract Values to Points" tool to add raster values to the attribute table of the point file.

Note: If you get an error it may because you did not export the data as a Shapefile.

Clean Up

It's always a good idea (and often critical) to check your data and clean up any problems. Load the "dbf" from your shapefile into Excel. Then, save the file as a "CSV" so you won't be changing the attribute table for your shapefile (this will make it impossible to load into ArcMap). Then, sort the data by each row in the column in both directions and delete and rows that have problems. Make a note of the number of rows you deleted.

Some of the problems you'll find are lat and/or lon that are zeros, raster values that are 0, and -9999 values which are "NoData" values from ArcGIS. The "elev" values can be 0 (sea level).

Geospatial Activities

Setting up a Data from the FIA Database

The FIA Database

Acessing the Database

Querying Data From the Forest Inventory Analysis Database

Preping the Point Files

Adding Corvaiates

Clean Up

Contact Info

HSU Geospatial Sites

Geospatial Information

OLM Archives